OpenAI says it needs copyrighted material to train its generative AI

OpenAI is currently dealing with several legal challenges regarding how it uses copyrighted material, including articles, books, and art, to train its generative artificial intelligence (AI) tools.

OpenAI, the company behind the AI chatbot ChatGPT, says it would be tough to train its AI tools without using copyrighted material.

Why is this a big deal?

Well, OpenAI is currently dealing with a bunch of lawsuits over its use of copyrighted works, such as articles, books, and art, to train ChatGPT.

And guess what? Other companies in the AI world are facing similar legal troubles.

Now, let’s talk about how these generative AI tools learn. They are trained on huge amounts of material from the internet. That material is like their school: it teaches them how to generate new content that looks and sounds as if a human created it.

OpenAI argued that it would be almost impossible to train today’s most advanced AI models without using copyrighted material.

The company made this argument in written evidence submitted to the UK House of Lords last month, noting that copyright now covers almost every type of human expression, including blog posts, photos, forum posts, snippets of software code, and government documents.


The Telegraph, a British newspaper, first reported on the company’s submission to a House of Lords inquiry into large language models (LLMs). In it, OpenAI argued that training only on public domain content would not produce AI systems that meet the needs of today’s citizens.

The company added that, although it believes copyright law does not forbid training, it acknowledges that there is still work to be done to support and empower creators.

ChatGPT, released in November 2022, has helped accelerate the development of AI tools thanks to its surge in popularity over the past year.

OpenAI reacts to New York Times lawsuit

The New York Times recently filed a lawsuit against OpenAI, claiming copyright infringement. The Times argued that the AI company owed it “billions of dollars in statutory and actual damages.”

OpenAI responded this week in a separate blog post addressing the lawsuit from the US newspaper, asserting that training AI models on material available on the internet falls under “fair use” and that The New York Times’ case is “without merit.”

In the same blog post, OpenAI argued that it offers publishers a simple opt-out that prevents its tools from accessing their websites. The company also described the memorization and repetition of training content as a “failure” of the system, which is meant to apply concepts to “new problems.”

The detailed 69-page lawsuit claims that OpenAI improperly used The New York Times’ content to develop AI systems that would compete with media companies.

According to the lawsuit, OpenAI’s tools produce “output that recites Times content verbatim, closely summarizes it, and mimics its expressive style, as demonstrated by scores of examples.”

One example in the lawsuit shows text generated by GPT-4 that closely mirrors a Pulitzer Prize-winning 2019 investigation by The New York Times.

In its blog post, OpenAI also highlighted its efforts to build partnerships with news organizations to “create mutually beneficial opportunities” and stressed that news content represents only a “tiny slice” of the data used to train its AI systems.

OpenAI has already signed agreements to license content for training from media companies such as the Associated Press and Axel Springer, which owns outlets including Politico, Business Insider, Bild, and Welt.


Stefan

I’m a journalist who enjoys discovering and sharing stories that often go unnoticed. I strongly believe in gaining insights into the complexities of the social world.